Overview

Dataset statistics

Number of variables: 32
Number of observations: 38659
Missing cells: 457949
Missing cells (%): 37.0%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 9.4 MiB
Average record size in memory: 256.0 B
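
The missing-cell percentage can be cross-checked from the counts above. The sketch below assumes the percentage is the number of missing cells divided by the total number of cells (variables × observations); the helper name is illustrative and not part of the profiling tool.

```python
# Figures copied from the dataset statistics above.
n_variables = 32
n_observations = 38659
missing_cells = 457949


def missing_cell_share(missing, n_vars, n_obs):
    """Fraction of all cells (variables x observations) that are missing."""
    return missing / (n_vars * n_obs)


share = missing_cell_share(missing_cells, n_variables, n_observations)
print(f"Missing cells: {share:.1%}")
```

Under that assumption the result rounds to the 37.0% reported above.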

Variable types

Numeric: 14
Text: 6
Categorical: 11
Unsupported: 1

Alerts

domains_count has constant value "" [Constant]
Dataset has 1 (< 0.1%) duplicate row [Duplicates]
CHROM is highly overall correlated with seq_region_name and 3 other fields [High correlation]
POS is highly overall correlated with AF and 2 other fields [High correlation]
1000gp3_eur_af is highly overall correlated with clinvar_id and 10 other fields [High correlation]
clinpred_rankscore is highly overall correlated with mutationassessor_rankscore and 4 other fields [High correlation]
clinvar_id is highly overall correlated with 1000gp3_eur_af and 5 other fields [High correlation]
gnomad_exomes_non_cancer_nfe_af is highly overall correlated with 1000gp3_eur_af and 10 other fields [High correlation]
mutationassessor_rankscore is highly overall correlated with clinpred_rankscore and 5 other fields [High correlation]
mutationtaster_converted_rankscore is highly overall correlated with clinpred_pred and 1 other field [High correlation]
polyphen2_hdiv_rankscore is highly overall correlated with clinpred_rankscore and 5 other fields [High correlation]
sift_converted_rankscore is highly overall correlated with 1000gp3_eur_af and 8 other fields [High correlation]
pubmed_count is highly overall correlated with 1000gp3_eur_af and 4 other fields [High correlation]
frequencies_af is highly overall correlated with 1000gp3_eur_af and 8 other fields [High correlation]
frequencies_gnomadg_nfe is highly overall correlated with 1000gp3_eur_af and 5 other fields [High correlation]
seq_region_name is highly overall correlated with CHROM and 3 other fields [High correlation]
AF is highly overall correlated with POS and 1 other field [High correlation]
GENEINFO is highly overall correlated with CHROM and 7 other fields [High correlation]
TISSUE is highly overall correlated with clinpred_rankscore and 2 other fields [High correlation]
CTYPE is highly overall correlated with CHROM and 7 other fields [High correlation]
GT is highly overall correlated with 1000gp3_eur_af and 2 other fields [High correlation]
clinpred_pred is highly overall correlated with clinpred_rankscore and 7 other fields [High correlation]
strand is highly overall correlated with CHROM and 8 other fields [High correlation]
clin_sig_allele is highly overall correlated with clinpred_pred [High correlation]
variant_class is highly overall correlated with 1000gp3_eur_af and 8 other fields [High correlation]
AF is highly imbalanced (98.0%) [Imbalance]
TISSUE is highly imbalanced (59.5%) [Imbalance]
clinpred_pred is highly imbalanced (71.3%) [Imbalance]
clin_sig_allele is highly imbalanced (90.8%) [Imbalance]
variant_class is highly imbalanced (65.6%) [Imbalance]
RIS. is highly imbalanced (97.1%) [Imbalance]
AF has 17793 (46.0%) missing values [Missing]
GENEINFO has 17793 (46.0%) missing values [Missing]
1000gp3_eur_af has 30399 (78.6%) missing values [Missing]
clinpred_pred has 29208 (75.6%) missing values [Missing]
clinpred_rankscore has 29208 (75.6%) missing values [Missing]
clinvar_id has 29549 (76.4%) missing values [Missing]
domains_count has 18486 (47.8%) missing values [Missing]
gnomad_exomes_non_cancer_nfe_af has 30037 (77.7%) missing values [Missing]
mutationassessor_rankscore has 33020 (85.4%) missing values [Missing]
mutationtaster_converted_rankscore has 28988 (75.0%) missing values [Missing]
polyphen2_hdiv_rankscore has 32981 (85.3%) missing values [Missing]
sift_converted_rankscore has 29211 (75.6%) missing values [Missing]
strand has 1973 (5.1%) missing values [Missing]
sift_score has 28997 (75.0%) missing values [Missing]
hgvsc has 2144 (5.5%) missing values [Missing]
clin_sig_allele has 8565 (22.2%) missing values [Missing]
pubmed_count has 28072 (72.6%) missing values [Missing]
frequencies has 38659 (100.0%) missing values [Missing]
frequencies_af has 9481 (24.5%) missing values [Missing]
frequencies_gnomadg_nfe has 9481 (24.5%) missing values [Missing]
variant_class has 1952 (5.0%) missing values [Missing]
seq_region_name has 1952 (5.0%) missing values [Missing]
frequencies is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
clinpred_rankscore has 955 (2.5%) zeros [Zeros]
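
The report does not state how the imbalance percentages are derived. A plausible reading, treated here as an assumption, is one minus the Shannon entropy of the non-missing category counts, normalised by the logarithm of the number of categories. The sketch below applies that formula to the TISSUE and AF counts listed later in this report; the function name is illustrative.

```python
import math


def imbalance_score(counts):
    """1 minus Shannon entropy normalised by log2(number of classes).

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    This formula is an assumption, not taken from the report itself.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# Non-missing category counts copied from the variable sections of the report.
tissue_counts = [35535, 3124]  # GERMLINE, SOMATIC
af_counts = [20827, 39]        # 0.0, 1.0

print(f"TISSUE: {imbalance_score(tissue_counts):.1%}")
print(f"AF: {imbalance_score(af_counts):.1%}")
```

With these counts the formula gives about 59.5% for TISSUE and 98.0% for AF, which agrees with the alerts above and supports the assumption without proving it.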

Reproduction

Analysis started: 2023-11-24 10:28:18.923735
Analysis finished: 2023-11-24 10:28:46.173317
Duration: 27.25 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
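
A report of this kind is typically produced by passing a pandas DataFrame to ydata-profiling. The sketch below shows one way that could look; the input path, output path, and report title are hypothetical, since the report does not record how the data was loaded or which settings were used.

```python
def build_profile_report(csv_path, html_path="report.html"):
    """Profile a CSV file and write an HTML report.

    Both paths are hypothetical placeholders; the report above does not
    name its input file.
    """
    # Imported inside the function so that defining it needs no third-party packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="Variant annotation profile")
    report.to_file(html_path)
    return report
```

Calling `build_profile_report("variants.csv")` (file name assumed) would write `report.html` next to the script, provided pandas and ydata-profiling are installed.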

Variables

CHROM
Real number (ℝ)

HIGH CORRELATION 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.114928
Minimum: 1
Maximum: 22
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 302.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 11
Median: 13
Q3: 17
95-th percentile: 17
Maximum: 22
Range: 21
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 5.1385038
Coefficient of variation (CV): 0.42414646
Kurtosis: -0.44466683
Mean: 12.114928
Median Absolute Deviation (MAD): 4
Skewness: -0.7315642
Sum: 468351
Variance: 26.404221
Monotonicity: Not monotonic
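
Several of the CHROM statistics above are derived from others, which makes them easy to sanity-check. The sketch below recomputes range, IQR, coefficient of variation, and variance from the reported minimum, maximum, quartiles, mean, and standard deviation, using the textbook definitions (assumed here to match the tool's).

```python
# Statistics copied from the CHROM summary above.
mean = 12.114928
std = 5.1385038
minimum, q1, q3, maximum = 1, 11, 17, 22

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance as the squared standard deviation

print(value_range, iqr, round(cv, 6), round(variance, 4))
```

The recomputed values agree with the reported Range (21), IQR (6), CV (0.42414646), and Variance (26.404221) up to rounding.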
Histogram with fixed size bins (bins=14)

Common values

Value             Count  Frequency (%)
13                13602  35.2%
17                11058  28.6%
2                 3130   8.1%
11                2717   7.0%
3                 1840   4.8%
7                 1571   4.1%
5                 1471   3.8%
16                850    2.2%
22                635    1.6%
8                 479    1.2%
Other values (4)  1306   3.4%

Minimum 10 values

Value  Count  Frequency (%)
1      222    0.6%
2      3130   8.1%
3      1840   4.8%
4      376    1.0%
5      1471   3.8%
7      1571   4.1%
8      479    1.2%
10     340    0.9%
11     2717   7.0%
13     13602  35.2%

Maximum 10 values

Value  Count  Frequency (%)
22     635    1.6%
19     368    1.0%
17     11058  28.6%
16     850    2.2%
13     13602  35.2%
11     2717   7.0%
10     340    0.9%
8      479    1.2%
7      1571   4.1%
5      1471   3.8%

POS
Real number (ℝ)

HIGH CORRELATION 

Distinct: 3431
Distinct (%): 8.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 55601948
Minimum: 1206466
Maximum: 2.1567462 × 10⁸
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 302.1 KiB

Quantile statistics

Minimum: 1206466
5-th percentile: 7578645
Q1: 32913055
Median: 41223094
Q3: 48023115
95-th percentile: 1.7892243 × 10⁸
Maximum: 2.1567462 × 10⁸
Range: 2.1446815 × 10⁸
Interquartile range (IQR): 15110060

Descriptive statistics

Standard deviation: 45465822
Coefficient of variation (CV): 0.81770195
Kurtosis: 4.238488
Mean: 55601948
Median Absolute Deviation (MAD): 8308089
Skewness: 2.134477
Sum: 2.1495157 × 10¹²
Variance: 2.0671409 × 10¹⁵
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
32913055             1722   4.5%
32915005             1722   4.5%
32929387             1720   4.4%
32936646             1239   3.2%
41244936             991    2.6%
41223094             962    2.5%
41234470             958    2.5%
41244000             955    2.5%
41245466             955    2.5%
41244435             953    2.5%
Other values (3421)  26482  68.5%

Minimum 10 values

Value    Count  Frequency (%)
1206466  1      < 0.1%
1206566  1      < 0.1%
1207176  1      < 0.1%
1207238  19     < 0.1%
1207280  7      < 0.1%
1218219  9      < 0.1%
1218523  9      < 0.1%
1218587  4      < 0.1%
1218596  7      < 0.1%
1219129  15     < 0.1%

Maximum 10 values

Value      Count  Frequency (%)
215674619  5      < 0.1%
215674445  1      < 0.1%
215674436  47     0.1%
215674376  3      < 0.1%
215674371  41     0.1%
215674341  43     0.1%
215674323  43     0.1%
215674224  30     0.1%
215674090  43     0.1%
215673948  6      < 0.1%

REF
Text

Distinct: 295
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 302.1 KiB

Length

Max length: 75
Median length: 1
Mean length: 1.5186632
Min length: 1

Characters and Unicode

Total characters: 58710
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
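
The length and character figures for REF are linked: with no missing values, the mean length should equal the total character count divided by the number of observations. A minimal check, assuming that definition:

```python
# Figures copied from the REF summary and the dataset statistics.
total_characters = 58710
n_observations = 38659  # REF has no missing values

mean_length = total_characters / n_observations
print(round(mean_length, 7))
```

The result matches the reported mean length of 1.5186632 to the displayed precision.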

Unique

Unique: 97
Unique (%): 0.3%

Sample

1st row: G
2nd row: T
3rd row: A
4th row: G
5th row: T

Value                Count  Frequency (%)
t                    10985  28.4%
a                    9939   25.7%
g                    8878   23.0%
c                    4894   12.7%
tt                   407    1.1%
aa                   187    0.5%
taa                  80     0.2%
ttg                  79     0.2%
at                   67     0.2%
ctttttttttttttttttt  61     0.2%
Other values (285)   3082   8.0%

Most occurring characters

Value  Count  Frequency (%)
T      21678  36.9%
A      18962  32.3%
G      10819  18.4%
C      7251   12.4%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  58710  100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
T      21678  36.9%
A      18962  32.3%
G      10819  18.4%
C      7251   12.4%

Most occurring scripts

Value  Count  Frequency (%)
Latin  58710  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
T      21678  36.9%
A      18962  32.3%
G      10819  18.4%
C      7251   12.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  58710  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
T      21678  36.9%
A      18962  32.3%
G      10819  18.4%
C      7251   12.4%

ALT
Text

Distinct: 199
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 302.1 KiB

Length

Max length: 342
Median length: 1
Mean length: 1.3797046
Min length: 1

Characters and Unicode

Total characters: 53338
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 56
Unique (%): 0.1%

Sample

1st row: A
2nd row: C
3rd row: G
4th row: C
5th row: C

Value               Count  Frequency (%)
c                   13739  35.5%
g                   9290   24.0%
a                   7539   19.5%
t                   5828   15.1%
tt                  114    0.3%
cacac               75     0.2%
ag                  68     0.2%
tat                 60     0.2%
tta                 57     0.1%
atttttttttt         51     0.1%
Other values (189)  1838   4.8%

Most occurring characters

Value  Count  Frequency (%)
C      15861  29.7%
A      13146  24.6%
T      12926  24.2%
G      11300  21.2%
N      100    0.2%
<      1      < 0.1%
D      1      < 0.1%
E      1      < 0.1%
L      1      < 0.1%
>      1      < 0.1%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  53336  > 99.9%
Math Symbol       2      < 0.1%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
C      15861  29.7%
A      13146  24.6%
T      12926  24.2%
G      11300  21.2%
N      100    0.2%
D      1      < 0.1%
E      1      < 0.1%
L      1      < 0.1%
Math Symbol
Value  Count  Frequency (%)
<      1      50.0%
>      1      50.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   53336  > 99.9%
Common  2      < 0.1%

Most frequent character per script

Latin
Value  Count  Frequency (%)
C      15861  29.7%
A      13146  24.6%
T      12926  24.2%
G      11300  21.2%
N      100    0.2%
D      1      < 0.1%
E      1      < 0.1%
L      1      < 0.1%
Common
Value  Count  Frequency (%)
<      1      50.0%
>      1      50.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  53338  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
C      15861  29.7%
A      13146  24.6%
T      12926  24.2%
G      11300  21.2%
N      100    0.2%
<      1      < 0.1%
D      1      < 0.1%
E      1      < 0.1%
L      1      < 0.1%
>      1      < 0.1%

AF
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 17793
Missing (%): 46.0%
Memory size: 302.1 KiB
Category counts: 0.0 (20827), 1.0 (39)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 62598
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value      Count  Frequency (%)
0.0        20827  53.9%
1.0        39     0.1%
(Missing)  17793  46.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0.0    20827  99.8%
1.0    39     0.2%
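
The two AF tables report different percentages for the same counts. The numbers are consistent with the Common Values table dividing by all observations (missing included) and the plot table dividing by non-missing observations only; that reading is an inference from the figures, sketched below.

```python
# Counts copied from the AF tables above.
counts = {"0.0": 20827, "1.0": 39}
n_missing = 17793

n_present = sum(counts.values())  # non-missing observations
n_total = n_present + n_missing   # all observations

# Share of each value among all rows, and among non-missing rows only.
share_of_all = {k: v / n_total for k, v in counts.items()}
share_of_present = {k: v / n_present for k, v in counts.items()}

for value in counts:
    print(value, f"{share_of_all[value]:.1%}", f"{share_of_present[value]:.1%}")
```

For the value 0.0 this gives 53.9% of all rows and 99.8% of non-missing rows, matching the two tables.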

Most occurring characters

Value  Count  Frequency (%)
0      41693  66.6%
.      20866  33.3%
1      39     0.1%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     41732  66.7%
Other Punctuation  20866  33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      41693  99.9%
1      39     0.1%
Other Punctuation
Value  Count  Frequency (%)
.      20866  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  62598  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      41693  66.6%
.      20866  33.3%
1      39     0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  62598  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      41693  66.6%
.      20866  33.3%
1      39     0.1%

GENEINFO
Categorical

HIGH CORRELATION  MISSING 

Distinct: 4
Distinct (%): < 0.1%
Missing: 17793
Missing (%): 46.0%
Memory size: 302.1 KiB
Category counts: BRCA2 (12293), BRCA1 (8074), BRCA2:675 (308), BRCA1:672 (191)

Length

Max length: 9
Median length: 5
Mean length: 5.095658
Min length: 5

Characters and Unicode

Total characters: 106326
Distinct characters: 10
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BRCA2
2nd row: BRCA2
3rd row: BRCA2
4th row: BRCA2
5th row: BRCA2

Common Values

Value      Count  Frequency (%)
BRCA2      12293  31.8%
BRCA1      8074   20.9%
BRCA2:675  308    0.8%
BRCA1:672  191    0.5%
(Missing)  17793  46.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count  Frequency (%)
brca2      12293  58.9%
brca1      8074   38.7%
brca2:675  308    1.5%
brca1:672  191    0.9%

Most occurring characters

Value  Count  Frequency (%)
B      20866  19.6%
R      20866  19.6%
C      20866  19.6%
A      20866  19.6%
2      12792  12.0%
1      8265   7.8%
:      499    0.5%
6      499    0.5%
7      499    0.5%
5      308    0.3%

Most occurring categories

Value              Count  Frequency (%)
Uppercase Letter   83464  78.5%
Decimal Number     22363  21.0%
Other Punctuation  499    0.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      12792  57.2%
1      8265   37.0%
6      499    2.2%
7      499    2.2%
5      308    1.4%
Uppercase Letter
Value  Count  Frequency (%)
B      20866  25.0%
R      20866  25.0%
C      20866  25.0%
A      20866  25.0%
Other Punctuation
Value  Count  Frequency (%)
:      499    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   83464  78.5%
Common  22862  21.5%

Most frequent character per script

Common
Value  Count  Frequency (%)
2      12792  56.0%
1      8265   36.2%
:      499    2.2%
6      499    2.2%
7      499    2.2%
5      308    1.3%
Latin
Value  Count  Frequency (%)
B      20866  25.0%
R      20866  25.0%
C      20866  25.0%
A      20866  25.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  106326  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
B      20866  19.6%
R      20866  19.6%
C      20866  19.6%
A      20866  19.6%
2      12792  12.0%
1      8265   7.8%
:      499    0.5%
6      499    0.5%
7      499    0.5%
5      308    0.3%

NAME
Text

Distinct: 1722
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Memory size: 302.1 KiB

Length

Max length: 10
Median length: 10
Mean length: 8.907085
Min length: 6

Characters and Unicode

Total characters: 344339
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BRCA24/19
2nd row: BRCA24/19
3rd row: BRCA24/19
4th row: BRCA24/19
5th row: BRCA24/19

Value                Count  Frequency (%)
hc68/19              506    1.3%
hc1/19               500    1.3%
brca290/21           486    1.3%
brca174/21           464    1.2%
hc10/19              442    1.1%
hc65/19              441    1.1%
hc1/22               441    1.1%
brca37/21            438    1.1%
hc100/19             435    1.1%
hc101/19             430    1.1%
Other values (1712)  34076  88.1%

Most occurring characters

Value             Count  Frequency (%)
1                 41635  12.1%
2                 40431  11.7%
C                 38659  11.2%
/                 38659  11.2%
B                 26147  7.6%
R                 26147  7.6%
A                 26147  7.6%
9                 22739  6.6%
0                 18233  5.3%
H                 12512  3.6%
Other values (6)  53030  15.4%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     176068  51.1%
Uppercase Letter   129612  37.6%
Other Punctuation  38659   11.2%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      41635  23.6%
2      40431  23.0%
9      22739  12.9%
0      18233  10.4%
3      11030  6.3%
4      10277  5.8%
6      9112   5.2%
5      8539   4.8%
7      7899   4.5%
8      6173   3.5%
Uppercase Letter
Value  Count  Frequency (%)
C      38659  29.8%
B      26147  20.2%
R      26147  20.2%
A      26147  20.2%
H      12512  9.7%
Other Punctuation
Value  Count  Frequency (%)
/      38659  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  214727  62.4%
Latin   129612  37.6%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      41635  19.4%
2      40431  18.8%
/      38659  18.0%
9      22739  10.6%
0      18233  8.5%
3      11030  5.1%
4      10277  4.8%
6      9112   4.2%
5      8539   4.0%
7      7899   3.7%
Latin
Value  Count  Frequency (%)
C      38659  29.8%
B      26147  20.2%
R      26147  20.2%
A      26147  20.2%
H      12512  9.7%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  344339  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
1                 41635  12.1%
2                 40431  11.7%
C                 38659  11.2%
/                 38659  11.2%
B                 26147  7.6%
R                 26147  7.6%
A                 26147  7.6%
9                 22739  6.6%
0                 18233  5.3%
H                 12512  3.6%
Other values (6)  53030  15.4%

TISSUE
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 302.1 KiB
Category counts: GERMLINE (35535), SOMATIC (3124)

Length

Max length: 8
Median length: 8
Mean length: 7.9191909
Min length: 7

Characters and Unicode

Total characters: 306148
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: GERMLINE
2nd row: GERMLINE
3rd row: GERMLINE
4th row: GERMLINE
5th row: GERMLINE

Common Values

Value     Count  Frequency (%)
GERMLINE  35535  91.9%
SOMATIC   3124   8.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value     Count  Frequency (%)
germline  35535  91.9%
somatic   3124   8.1%

Most occurring characters

Value             Count  Frequency (%)
E                 71070  23.2%
M                 38659  12.6%
I                 38659  12.6%
G                 35535  11.6%
R                 35535  11.6%
L                 35535  11.6%
N                 35535  11.6%
S                 3124   1.0%
O                 3124   1.0%
A                 3124   1.0%
Other values (2)  6248   2.0%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  306148  100.0%

Most frequent character per category

Uppercase Letter
Value             Count  Frequency (%)
E                 71070  23.2%
M                 38659  12.6%
I                 38659  12.6%
G                 35535  11.6%
R                 35535  11.6%
L                 35535  11.6%
N                 35535  11.6%
S                 3124   1.0%
O                 3124   1.0%
A                 3124   1.0%
Other values (2)  6248   2.0%

Most occurring scripts

Value  Count   Frequency (%)
Latin  306148  100.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
E                 71070  23.2%
M                 38659  12.6%
I                 38659  12.6%
G                 35535  11.6%
R                 35535  11.6%
L                 35535  11.6%
N                 35535  11.6%
S                 3124   1.0%
O                 3124   1.0%
A                 3124   1.0%
Other values (2)  6248   2.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  306148  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
E                 71070  23.2%
M                 38659  12.6%
I                 38659  12.6%
G                 35535  11.6%
R                 35535  11.6%
L                 35535  11.6%
N                 35535  11.6%
S                 3124   1.0%
O                 3124   1.0%
A                 3124   1.0%
Other values (2)  6248   2.0%

CTYPE
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 302.1 KiB
Category counts: BRCA (26147), HC (12512)

Length

Max length: 4
Median length: 4
Mean length: 3.3526992
Min length: 2

Characters and Unicode

Total characters: 129612
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BRCA
2nd row: BRCA
3rd row: BRCA
4th row: BRCA
5th row: BRCA

Common Values

Value  Count  Frequency (%)
BRCA   26147  67.6%
HC     12512  32.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
brca   26147  67.6%
hc     12512  32.4%

Most occurring characters

Value  Count  Frequency (%)
C      38659  29.8%
B      26147  20.2%
R      26147  20.2%
A      26147  20.2%
H      12512  9.7%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  129612  100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
C      38659  29.8%
B      26147  20.2%
R      26147  20.2%
A      26147  20.2%
H      12512  9.7%

Most occurring scripts

Value  Count   Frequency (%)
Latin  129612  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
C      38659  29.8%
B      26147  20.2%
R      26147  20.2%
A      26147  20.2%
H      12512  9.7%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  129612  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
C      38659  29.8%
B      26147  20.2%
R      26147  20.2%
A      26147  20.2%
H      12512  9.7%

GT
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 302.1 KiB
Category counts: 0/1 (24055), 1/1 (12226), 0/0 (2378)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 115977
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0/1
2nd row: 0/1
3rd row: 1/1
4th row: 1/1
5th row: 1/1

Common Values

Value  Count  Frequency (%)
0/1    24055  62.2%
1/1    12226  31.6%
0/0    2378   6.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0/1    24055  62.2%
1/1    12226  31.6%
0/0    2378   6.2%

Most occurring characters

Value  Count  Frequency (%)
1      48507  41.8%
/      38659  33.3%
0      28811  24.8%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     77318  66.7%
Other Punctuation  38659  33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      48507  62.7%
0      28811  37.3%
Other Punctuation
Value  Count  Frequency (%)
/      38659  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  115977  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      48507  41.8%
/      38659  33.3%
0      28811  24.8%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  115977  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      48507  41.8%
/      38659  33.3%
0      28811  24.8%
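
For GT, the character tables are effectively allele tallies: each genotype string contributes its characters once per observation. The sketch below rebuilds the character counts from the genotype counts in the Common Values table; it illustrates the relationship and is not the tool's own code.

```python
from collections import Counter

# Genotype counts copied from the GT Common Values table.
genotype_counts = {"0/1": 24055, "1/1": 12226, "0/0": 2378}

# Every character of a genotype string occurs once per observation of that genotype.
char_counts = Counter()
for genotype, n in genotype_counts.items():
    for ch in genotype:
        char_counts[ch] += n

for ch, n in char_counts.most_common():
    print(ch, n)
```

The totals come out as 48507 for "1", 38659 for "/", and 28811 for "0", the same as the character table above.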

1000gp3_eur_af
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 55
Distinct (%): 0.7%
Missing: 30399
Missing (%): 78.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.44026266
Minimum: 0
Maximum: 0.99900596
Zeros: 92
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 302.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.024850895
Q1: 0.29522863
Median: 0.35685885
Q3: 0.36282306
95-th percentile: 0.99900596
Maximum: 0.99900596
Range: 0.99900596
Interquartile range (IQR): 0.067594433

Descriptive statistics

Standard deviation: 0.31685814
Coefficient of variation (CV): 0.71970251
Kurtosis: -0.5032036
Mean: 0.44026266
Median Absolute Deviation (MAD): 0.061630219
Skewness: 0.79913224
Sum: 3636.5696
Variance: 0.10039908
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
0.9990059642       1720   4.4%
0.3628230616       991    2.6%
0.3598409543       962    2.5%
0.3548707753       955    2.5%
0.3568588469       953    2.5%
0.2952286282       827    2.1%
0.08349900596      270    0.7%
0.05964214712      257    0.7%
0.03479125249      255    0.7%
0.02882703777      113    0.3%
Other values (45)  957    2.5%
(Missing)          30399  78.6%

Minimum 10 values

Value            Count  Frequency (%)
0                92     0.2%
0.0009940357853  64     0.2%
0.001988071571   15     < 0.1%
0.002982107356   60     0.2%
0.003976143141   3      < 0.1%
0.004970178926   11     < 0.1%
0.005964214712   3      < 0.1%
0.006958250497   6      < 0.1%
0.007952286282   28     0.1%
0.008946322068   2      < 0.1%

Maximum 10 values

Value         Count  Frequency (%)
0.9990059642  1720   4.4%
0.8767395626  45     0.1%
0.7654075547  45     0.1%
0.7147117296  43     0.1%
0.6411530815  41     0.1%
0.5616302187  40     0.1%
0.4691848907  33     0.1%
0.4373757455  31     0.1%
0.3946322068  30     0.1%
0.3787276342  24     0.1%

clinpred_pred
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 29208
Missing (%): 75.6%
Memory size: 302.1 KiB
Category counts: T (8976), D (475)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 9451
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: T
2nd row: T
3rd row: T
4th row: T
5th row: T

Common Values

Value      Count  Frequency (%)
T          8976   23.2%
D          475    1.2%
(Missing)  29208  75.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
t      8976   95.0%
d      475    5.0%

Most occurring characters

Value  Count  Frequency (%)
T      8976   95.0%
D      475    5.0%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  9451   100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
T      8976   95.0%
D      475    5.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  9451   100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
T      8976   95.0%
D      475    5.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  9451   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
T      8976   95.0%
D      475    5.0%

clinpred_rankscore
Real number (ℝ)

HIGH CORRELATION  MISSING  ZEROS 

Distinct: 1007
Distinct (%): 10.7%
Missing: 29208
Missing (%): 75.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.043587135
Minimum: 0
Maximum: 0.95599
Zeros: 955
Zeros (%): 2.5%
Negative: 0
Negative (%): 0.0%
Memory size: 302.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.00024
Median: 0.00078
Q3: 0.02075
95-th percentile: 0.33286
Maximum: 0.95599
Range: 0.95599
Interquartile range (IQR): 0.02051

Descriptive statistics

Standard deviation: 0.13955438
Coefficient of variation (CV): 3.2017333
Kurtosis: 16.493927
Mean: 0.043587135
Median Absolute Deviation (MAD): 0.00064
Skewness: 4.0443231
Sum: 411.94201
Variance: 0.019475425
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
0.00085             1721   4.5%
0.00026             991    2.6%
0.00024             962    2.5%
0                   955    2.5%
0.02075             953    2.5%
0.00012             827    2.1%
0.00059             383    1.0%
0.02536             257    0.7%
9 × 10⁻⁵            150    0.4%
0.00142             127    0.3%
Other values (997)  2125   5.5%
(Missing)           29208  75.6%

Minimum 10 values

Value     Count  Frequency (%)
0         955    2.5%
1 × 10⁻⁵  31     0.1%
2 × 10⁻⁵  6      < 0.1%
4 × 10⁻⁵  5      < 0.1%
5 × 10⁻⁵  26     0.1%
9 × 10⁻⁵  150    0.4%
0.0001    8      < 0.1%
0.00012   827    2.1%
0.00014   13     < 0.1%
0.00016   45     0.1%

Maximum 10 values

Value    Count  Frequency (%)
0.95599  7      < 0.1%
0.95503  1      < 0.1%
0.94592  1      < 0.1%
0.92979  1      < 0.1%
0.92389  1      < 0.1%
0.91716  1      < 0.1%
0.90962  1      < 0.1%
0.90576  4      < 0.1%
0.90554  1      < 0.1%
0.90116  1      < 0.1%

clinvar_id
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 725
Distinct (%): 8.0%
Missing: 29549
Missing (%): 76.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 88995.646
Minimum: 829
Maximum: 1332623
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 302.1 KiB

Quantile statistics

Minimum: 829
5-th percentile: 9329
Q1: 41808
Median: 41818
Q3: 133738
95-th percentile: 183700
Maximum: 1332623
Range: 1331794
Interquartile range (IQR): 91930

Descriptive statistics

Standard deviation: 144642.29
Coefficient of variation (CV): 1.6252738
Kurtosis: 26.778679
Mean: 88995.646
Median Absolute Deviation (MAD): 251
Skewness: 4.8594024
Sum: 8.1075033 × 10⁸
Variance: 2.0921393 × 10¹⁰
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
133738              1720   4.4%
41812               991    2.6%
41827               962    2.5%
41818               955    2.5%
41815               953    2.5%
9329                827    2.1%
41808               270    0.7%
41803               257    0.7%
41545               128    0.3%
41567               127    0.3%
Other values (715)  1920   5.0%
(Missing)           29549  76.4%

Minimum 10 values

Value  Count  Frequency (%)
829    2      < 0.1%
1762   3      < 0.1%
3048   1      < 0.1%
5294   1      < 0.1%
8045   3      < 0.1%
9329   827    2.1%
9347   1      < 0.1%
12351  43     0.1%
17661  2      < 0.1%
17670  113    0.3%

Maximum 10 values

Value    Count  Frequency (%)
1332623  1      < 0.1%
1319574  2      < 0.1%
1319570  2      < 0.1%
1312623  1      < 0.1%
1309096  1      < 0.1%
1166234  24     0.1%
1131690  1      < 0.1%
1064262  1      < 0.1%
1059432  1      < 0.1%
1056420  1      < 0.1%

domains_count
Categorical

CONSTANT  MISSING 

Distinct: 1
Distinct (%): < 0.1%
Missing: 18486
Missing (%): 47.8%
Memory size: 302.1 KiB
Category counts: 2.0 (20173)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 60519
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 2.0
4th row: 2.0
5th row: 2.0

Common Values

Value      Count  Frequency (%)
2.0        20173  52.2%
(Missing)  18486  47.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
2.0    20173  100.0%

Most occurring characters

Value  Count  Frequency (%)
2      20173  33.3%
.      20173  33.3%
0      20173  33.3%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     40346  66.7%
Other Punctuation  20173  33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      20173  50.0%
0      20173  50.0%
Other Punctuation
Value  Count  Frequency (%)
.      20173  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  60519  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
2      20173  33.3%
.      20173  33.3%
0      20173  33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  60519  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
2      20173  33.3%
.      20173  33.3%
0      20173  33.3%

gnomad_exomes_non_cancer_nfe_af
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 272
Distinct (%): 3.2%
Missing: 30037
Missing (%): 77.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.40685234
Minimum: 0
Maximum: 0.999707
Zeros: 113
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 302.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.000489793
Q1: 0.278914
Median: 0.325862
Q3: 0.334428
95-th percentile: 0.999707
Maximum: 0.999707
Range: 0.999707
Interquartile range (IQR): 0.055514

Descriptive statistics

Standard deviation: 0.32595217
Coefficient of variation (CV): 0.80115594
Kurtosis: -0.40957203
Mean: 0.40685234
Median Absolute Deviation (MAD): 0.046948
Skewness: 0.89932934
Sum: 3507.8809
Variance: 0.10624482
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
0.999707            1720   4.4%
0.334428            991    2.6%
0.326518            962    2.5%
0.324908            955    2.5%
0.325862            953    2.5%
0.278914            827    2.1%
0.0777997           270    0.7%
0.0645935           257    0.7%
0.0348801           128    0.3%
0.034771            127    0.3%
Other values (262)  1432   3.7%
(Missing)           30037  77.7%

Minimum 10 values

Value           Count  Frequency (%)
0               113    0.3%
9.73312 × 10⁻⁶  1      < 0.1%
9.73539 × 10⁻⁶  2      < 0.1%
9.73634 × 10⁻⁶  1      < 0.1%
9.73672 × 10⁻⁶  1      < 0.1%
9.73691 × 10⁻⁶  2      < 0.1%
9.73786 × 10⁻⁶  1      < 0.1%
9.73881 × 10⁻⁶  1      < 0.1%
9.73975 × 10⁻⁶  3      < 0.1%
9.74051 × 10⁻⁶  1      < 0.1%

Maximum 10 values

Value     Count  Frequency (%)
0.999707  1720   4.4%
0.850549  45     0.1%
0.770919  45     0.1%
0.738026  43     0.1%
0.617268  41     0.1%
0.587903  40     0.1%
0.435408  33     0.1%
0.416586  31     0.1%
0.403692  24     0.1%
0.377731  30     0.1%

mutationassessor_rankscore
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct266
Distinct (%)4.7%
Missing33020
Missing (%)85.4%
Infinite0
Infinite (%)0.0%
Mean0.32885815
Minimum4 × 10⁻⁵
Maximum0.98483
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size302.1 KiB

Quantile statistics

Minimum4 × 10⁻⁵
5-th percentile4 × 10⁻⁵
Q10.02676
median0.33814
Q30.64647
95-th percentile0.90961
Maximum0.98483
Range0.98479
Interquartile range (IQR)0.61971

Descriptive statistics

Standard deviation0.30098866
Coefficient of variation (CV)0.91525378
Kurtosis-1.3213004
Mean0.32885815
Median Absolute Deviation (MAD)0.30833
Skewness0.2783291
Sum1854.4311
Variance0.090594173
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
4 × 10⁻⁵ 991
 
2.6%
0.48678 963
 
2.5%
0.02676 955
 
2.5%
0.64647 953
 
2.5%
0.11182 279
 
0.7%
0.92174 257
 
0.7%
0.55503 113
 
0.3%
0.25572 63
 
0.2%
0.09039 56
 
0.1%
0.01383 45
 
0.1%
Other values (256) 964
 
2.5%
(Missing) 33020
85.4%
ValueCountFrequency (%)
4 × 10⁻⁵ 991
2.6%
0.00015 2
 
< 0.1%
0.00021 1
 
< 0.1%
0.00063 2
 
< 0.1%
0.00086 1
 
< 0.1%
0.00254 45
 
0.1%
0.00541 1
 
< 0.1%
0.00573 13
 
< 0.1%
0.00597 36
 
0.1%
0.00812 22
 
0.1%
ValueCountFrequency (%)
0.98483 2
< 0.1%
0.98424 3
< 0.1%
0.97262 1
 
< 0.1%
0.96783 1
 
< 0.1%
0.95518 1
 
< 0.1%
0.95291 1
 
< 0.1%
0.94976 1
 
< 0.1%
0.94936 1
 
< 0.1%
0.94485 3
< 0.1%
0.94442 1
 
< 0.1%

mutationtaster_converted_rankscore
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct270
Distinct (%)2.8%
Missing28988
Missing (%)75.0%
Infinite0
Infinite (%)0.0%
Mean0.20540653
Minimum0.08975
Maximum0.81001
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size302.1 KiB

Quantile statistics

Minimum0.08975
5-th percentile0.08975
Q10.08975
median0.08975
Q30.22811
95-th percentile0.81001
Maximum0.81001
Range0.72026
Interquartile range (IQR)0.13836

Descriptive statistics

Standard deviation0.2179582
Coefficient of variation (CV)1.0611065
Kurtosis1.3831495
Mean0.20540653
Median Absolute Deviation (MAD)0
Skewness1.6917373
Sum1986.4866
Variance0.047505775
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0.08975 7042
 
18.2%
0.58761 1086
 
2.8%
0.81001 526
 
1.4%
0.25265 257
 
0.7%
0.22811 113
 
0.3%
0.25075 45
 
0.1%
0.27532 45
 
0.1%
0.18612 31
 
0.1%
0.20638 25
 
0.1%
0.23243 20
 
0.1%
Other values (260) 481
 
1.2%
(Missing) 28988
75.0%
ValueCountFrequency (%)
0.08975 7042
18.2%
0.18198 10
 
< 0.1%
0.18612 31
 
0.1%
0.18878 7
 
< 0.1%
0.19072 11
 
< 0.1%
0.19238 1
 
< 0.1%
0.19486 1
 
< 0.1%
0.19599 1
 
< 0.1%
0.19853 1
 
< 0.1%
0.19925 1
 
< 0.1%
ValueCountFrequency (%)
0.81001 526
1.4%
0.58761 1086
2.8%
0.54805 2
 
< 0.1%
0.53665 3
 
< 0.1%
0.52935 14
 
< 0.1%
0.52396 6
 
< 0.1%
0.51968 14
 
< 0.1%
0.51612 3
 
< 0.1%
0.51308 2
 
< 0.1%
0.51042 1
 
< 0.1%

polyphen2_hdiv_rankscore
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct213
Distinct (%)3.8%
Missing32981
Missing (%)85.3%
Infinite0
Infinite (%)0.0%
Mean0.26620307
Minimum0.02946
Maximum0.90584
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size302.1 KiB

Quantile statistics

Minimum0.02946
5-th percentile0.02946
Q10.02946
median0.28547
Q30.52359
95-th percentile0.7322
Maximum0.90584
Range0.87638
Interquartile range (IQR)0.49413

Descriptive statistics

Standard deviation0.24119701
Coefficient of variation (CV)0.90606397
Kurtosis-0.82086029
Mean0.26620307
Median Absolute Deviation (MAD)0.23812
Skewness0.57671055
Sum1511.5011
Variance0.058176
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0.02946 2146
 
5.6%
0.31319 973
 
2.5%
0.52359 953
 
2.5%
0.07471 315
 
0.8%
0.7322 271
 
0.7%
0.57829 119
 
0.3%
0.90584 74
 
0.2%
0.28547 56
 
0.1%
0.2013 43
 
0.1%
0.43117 43
 
0.1%
Other values (203) 685
 
1.8%
(Missing) 32981
85.3%
ValueCountFrequency (%)
0.02946 2146
5.6%
0.07471 315
 
0.8%
0.09854 37
 
0.1%
0.11197 4
 
< 0.1%
0.12183 2
 
< 0.1%
0.12996 3
 
< 0.1%
0.13644 33
 
0.1%
0.14184 2
 
< 0.1%
0.14655 15
 
< 0.1%
0.15093 3
 
< 0.1%
ValueCountFrequency (%)
0.90584 74
 
0.2%
0.77913 30
 
0.1%
0.7322 271
0.7%
0.70673 3
 
< 0.1%
0.68779 18
 
< 0.1%
0.67487 4
 
< 0.1%
0.66517 2
 
< 0.1%
0.65571 4
 
< 0.1%
0.6407 1
 
< 0.1%
0.63424 3
 
< 0.1%

sift_converted_rankscore
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct352
Distinct (%)3.7%
Missing29211
Missing (%)75.6%
Infinite0
Infinite (%)0.0%
Mean0.24909625
Minimum0.00964
Maximum0.91255
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size302.1 KiB

Quantile statistics

Minimum0.00964
5-th percentile0.00964
Q10.00964
median0.24955
Q30.44694
95-th percentile0.72154
Maximum0.91255
Range0.90291
Interquartile range (IQR)0.4373

Descriptive statistics

Standard deviation0.24998814
Coefficient of variation (CV)1.0035805
Kurtosis-0.62151011
Mean0.24909625
Median Absolute Deviation (MAD)0.23991
Skewness0.62751686
Sum2353.4613
Variance0.062494069
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0.00964 3943
 
10.2%
0.44694 971
 
2.5%
0.46129 958
 
2.5%
0.25768 828
 
2.1%
0.63226 283
 
0.7%
0.5553 273
 
0.7%
0.91255 204
 
0.5%
0.72154 184
 
0.5%
0.35349 119
 
0.3%
0.7849 115
 
0.3%
Other values (342) 1570
 
4.1%
(Missing) 29211
75.6%
ValueCountFrequency (%)
0.00964 3943
10.2%
0.02084 1
 
< 0.1%
0.02176 1
 
< 0.1%
0.02228 1
 
< 0.1%
0.02239 1
 
< 0.1%
0.02292 26
 
0.1%
0.02407 1
 
< 0.1%
0.02782 1
 
< 0.1%
0.02803 31
 
0.1%
0.02832 1
 
< 0.1%
ValueCountFrequency (%)
0.91255 204
0.5%
0.7849 115
0.3%
0.72154 184
0.5%
0.68238 51
 
0.1%
0.65419 25
 
0.1%
0.63226 283
0.7%
0.61437 11
 
< 0.1%
0.59928 20
 
0.1%
0.58626 9
 
< 0.1%
0.5748 5
 
< 0.1%

strand
Categorical

HIGH CORRELATION  MISSING 

Distinct2
Distinct (%)< 0.1%
Missing1973
Missing (%)5.1%
Memory size302.1 KiB
1.0
21162 
-1.0
15524 

Length

Max length4
Median length3
Mean length3.4231587
Min length3

Characters and Unicode

Total characters125582
Distinct characters4
Distinct categories3
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
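
As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to tally the general category of every character in the `strand` values, weighted by the value counts reported for this variable. It is not the profiler's own code, only one way to reproduce the category totals.

```python
import unicodedata
from collections import Counter

# Value counts for `strand` as reported (string form of the values).
value_counts = {"1.0": 21162, "-1.0": 15524}

category_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        # unicodedata.category returns a two-letter code, for example
        # "Nd" = Decimal Number, "Po" = Other Punctuation, "Pd" = Dash Punctuation.
        category_counts[unicodedata.category(ch)] += count

total_chars = sum(category_counts.values())
print(dict(category_counts), total_chars)
```

The three codes correspond to the Decimal Number, Other Punctuation and Dash Punctuation rows listed for this variable.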

Unique

Unique0
Unique (%)0.0%

Sample

1st row1.0
2nd row1.0
3rd row1.0
4th row1.0
5th row1.0

Common Values

ValueCountFrequency (%)
1.0 21162
54.7%
-1.0 15524
40.2%
(Missing) 1973
 
5.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1.0 36686
100.0%

Most occurring characters

ValueCountFrequency (%)
1 36686
29.2%
. 36686
29.2%
0 36686
29.2%
- 15524
12.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 73372
58.4%
Other Punctuation 36686
29.2%
Dash Punctuation 15524
 
12.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 36686
50.0%
0 36686
50.0%
Other Punctuation
ValueCountFrequency (%)
. 36686
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 15524
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 125582
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 36686
29.2%
. 36686
29.2%
0 36686
29.2%
- 15524
12.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 125582
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 36686
29.2%
. 36686
29.2%
0 36686
29.2%
- 15524
12.4%

sift_score
Text

MISSING 

Distinct93
Distinct (%)1.0%
Missing28997
Missing (%)75.0%
Memory size302.1 KiB

Length

Max length27
Median length4
Mean length2.6549369
Min length1

Characters and Unicode

Total characters25652
Distinct characters12
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique17
Unique (%)0.2%

Sample

1st row1.0
2nd row0.04
3rd row1.0
4th row0.08
5th row1.0
ValueCountFrequency (%)
1 4082
42.2%
0.08 996
 
10.3%
0.09 987
 
10.2%
0.04 898
 
9.3%
0 377
 
3.9%
0.01 377
 
3.9%
0.05 338
 
3.5%
214
 
2.2%
0.03 177
 
1.8%
0.16 129
 
1.3%
Other values (75) 1087
 
11.3%
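
The maximum length of 27 and the commas counted among this variable's characters suggest that some sift_score cells hold several comma-separated scores, which would explain why the column is typed as Text rather than numeric. A minimal sketch of converting such cells to floats follows. It assumes comma-separated values and that an empty string or "." marks a missing score. Both are assumptions, and `parse_scores` is a hypothetical helper, not part of the report.

```python
def parse_scores(raw):
    """Return the list of float scores found in one sift_score cell.

    Assumes scores are comma-separated and that "" or "." means missing.
    """
    if raw is None:
        return []
    scores = []
    for part in str(raw).split(","):
        part = part.strip()
        if part in ("", "."):
            continue  # treat as missing (assumption)
        scores.append(float(part))
    return scores

print(parse_scores("0.04,0.08"))
print(parse_scores("1.0"))
```

Whether to keep the minimum, the mean, or the first score per cell is an analysis decision the report does not settle.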

Most occurring characters

ValueCountFrequency (%)
0 9270
36.1%
. 5782
22.5%
1 5117
19.9%
4 1168
 
4.6%
9 1076
 
4.2%
8 1046
 
4.1%
, 579
 
2.3%
5 487
 
1.9%
3 416
 
1.6%
2 352
 
1.4%
Other values (2) 359
 
1.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 19291
75.2%
Other Punctuation 6361
 
24.8%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 9270
48.1%
1 5117
26.5%
4 1168
 
6.1%
9 1076
 
5.6%
8 1046
 
5.4%
5 487
 
2.5%
3 416
 
2.2%
2 352
 
1.8%
6 256
 
1.3%
7 103
 
0.5%
Other Punctuation
ValueCountFrequency (%)
. 5782
90.9%
, 579
 
9.1%

Most occurring scripts

ValueCountFrequency (%)
Common 25652
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 9270
36.1%
. 5782
22.5%
1 5117
19.9%
4 1168
 
4.6%
9 1076
 
4.2%
8 1046
 
4.1%
, 579
 
2.3%
5 487
 
1.9%
3 416
 
1.6%
2 352
 
1.4%
Other values (2) 359
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 25652
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 9270
36.1%
. 5782
22.5%
1 5117
19.9%
4 1168
 
4.6%
9 1076
 
4.2%
8 1046
 
4.1%
, 579
 
2.3%
5 487
 
1.9%
3 416
 
1.6%
2 352
 
1.4%
Other values (2) 359
 
1.4%

hgvsc
Text

MISSING 

Distinct3315
Distinct (%)9.1%
Missing2144
Missing (%)5.5%
Memory size302.1 KiB

Length

Max length89
Median length27
Mean length28.903957
Min length24

Characters and Unicode

Total characters1055428
Distinct characters33
Distinct categories7
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1902
Unique (%)5.2%

Sample

1st rowENST00000380152.8:c.-26G>A
2nd rowENST00000380152.8:c.3807T>C
3rd rowENST00000380152.8:c.4563A>G
4th rowENST00000380152.8:c.6513G>C
5th rowENST00000380152.8:c.7397T>C
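
The sample rows follow the pattern transcript, `:c.`, coding position, reference base, `>`, alternate base. A minimal sketch of splitting such strings into parts is given below. The regular expression is an assumption that covers only simple substitutions on Ensembl transcripts, as in the sample rows, and not the full HGVS grammar. `parse_hgvsc` is a hypothetical helper.

```python
import re

# Simple substitutions only, e.g. ENST00000380152.8:c.4563A>G.
# The position may carry a leading "-" or "*" and an intronic offset (+N / -N).
HGVSC_SUBSTITUTION = re.compile(
    r"^(?P<transcript>ENST\d+\.\d+):c\."
    r"(?P<position>[-*]?\d+(?:[+-]\d+)?)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)

def parse_hgvsc(value):
    """Return a dict of named parts, or None if the value is not a simple substitution."""
    match = HGVSC_SUBSTITUTION.match(value)
    return match.groupdict() if match else None

for example in ("ENST00000380152.8:c.-26G>A", "ENST00000380152.8:c.7806-14T>C"):
    print(parse_hgvsc(example))
```

Deletions, insertions and duplications would return `None` here and need their own patterns.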
ValueCountFrequency (%)
enst00000380152.8:c.4563a>g 1722
 
4.7%
enst00000380152.8:c.6513g>c 1721
 
4.7%
enst00000380152.8:c.7397t>c 1720
 
4.7%
enst00000380152.8:c.7806-14t>c 1239
 
3.4%
enst00000357654.9:c.2612c>t 991
 
2.7%
enst00000357654.9:c.4837a>g 962
 
2.6%
enst00000357654.9:c.4308t>c 958
 
2.6%
enst00000357654.9:c.2082c>t 955
 
2.6%
enst00000357654.9:c.3548a>g 955
 
2.6%
enst00000357654.9:c.2311t>c 953
 
2.6%
Other values (3305) 24339
66.7%

Most occurring characters

ValueCountFrequency (%)
0 215877
20.5%
. 73030
 
6.9%
3 61357
 
5.8%
5 54032
 
5.1%
1 52529
 
5.0%
T 52121
 
4.9%
8 49104
 
4.7%
2 48400
 
4.6%
4 39892
 
3.8%
6 38095
 
3.6%
Other values (23) 370991
35.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 627821
59.5%
Uppercase Letter 215808
 
20.4%
Other Punctuation 110167
 
10.4%
Lowercase Letter 47309
 
4.5%
Math Symbol 40950
 
3.9%
Dash Punctuation 10635
 
1.0%
Connector Punctuation 2738
 
0.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 215877
34.4%
3 61357
 
9.8%
5 54032
 
8.6%
1 52529
 
8.4%
8 49104
 
7.8%
2 48400
 
7.7%
4 39892
 
6.4%
6 38095
 
6.1%
7 35814
 
5.7%
9 32721
 
5.2%
Lowercase Letter
ValueCountFrequency (%)
c 36515
77.2%
d 2757
 
5.8%
e 2633
 
5.6%
l 2633
 
5.6%
i 841
 
1.8%
n 841
 
1.8%
s 841
 
1.8%
u 124
 
0.3%
p 124
 
0.3%
Uppercase Letter
ValueCountFrequency (%)
T 52121
24.2%
E 36515
16.9%
N 36515
16.9%
S 36515
16.9%
G 18492
 
8.6%
C 18134
 
8.4%
A 17516
 
8.1%
Other Punctuation
ValueCountFrequency (%)
. 73030
66.3%
: 36515
33.1%
* 622
 
0.6%
Math Symbol
ValueCountFrequency (%)
> 32917
80.4%
+ 8033
 
19.6%
Dash Punctuation
ValueCountFrequency (%)
- 10635
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2738
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 792311
75.1%
Latin 263117
 
24.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 215877
27.2%
. 73030
 
9.2%
3 61357
 
7.7%
5 54032
 
6.8%
1 52529
 
6.6%
8 49104
 
6.2%
2 48400
 
6.1%
4 39892
 
5.0%
6 38095
 
4.8%
: 36515
 
4.6%
Other values (7) 123480
15.6%
Latin
ValueCountFrequency (%)
T 52121
19.8%
E 36515
13.9%
N 36515
13.9%
S 36515
13.9%
c 36515
13.9%
G 18492
 
7.0%
C 18134
 
6.9%
A 17516
 
6.7%
d 2757
 
1.0%
e 2633
 
1.0%
Other values (6) 5404
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1055428
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 215877
20.5%
. 73030
 
6.9%
3 61357
 
5.8%
5 54032
 
5.1%
1 52529
 
5.0%
T 52121
 
4.9%
8 49104
 
4.7%
2 48400
 
4.6%
4 39892
 
3.8%
6 38095
 
3.6%
Other values (23) 370991
35.2%

clin_sig_allele
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct6
Distinct (%)< 0.1%
Missing8565
Missing (%)22.2%
Memory size302.1 KiB
NEG
29115 
VUS
 
744
POS
 
209
A:risk_factor;A:benign
 
22
G:risk_factor;G:benign;G:benign/likely_benign;G:likely_benign
 
3

Length

Max length95
Median length3
Mean length3.0227288
Min length3

Characters and Unicode

Total characters90966
Distinct characters30
Distinct categories4
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1
Unique (%)< 0.1%

Sample

1st rowNEG
2nd rowNEG
3rd rowNEG
4th rowNEG
5th rowNEG

Common Values

ValueCountFrequency (%)
NEG 29115
75.3%
VUS 744
 
1.9%
POS 209
 
0.5%
A:risk_factor;A:benign 22
 
0.1%
G:risk_factor;G:benign;G:benign/likely_benign;G:likely_benign 3
 
< 0.1%
T:uncertain_significance;G:risk_factor;G:benign/likely_benign;G:uncertain_significance;G:benign 1
 
< 0.1%
(Missing) 8565
 
22.2%
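
The rarer values in this table pack several `allele:term` pairs into one cell, separated by semicolons, while the common values are plain labels such as NEG. A minimal sketch of splitting either form into (allele, term) pairs is shown below. `parse_clin_sig` is a hypothetical helper, and returning `None` as the allele for plain labels is an illustrative choice.

```python
def parse_clin_sig(value):
    """Split 'A:risk_factor;A:benign' into (allele, term) pairs.

    Plain labels without a colon (e.g. 'NEG') are returned with allele None.
    """
    pairs = []
    for item in value.split(";"):
        allele, sep, term = item.partition(":")
        if sep:
            pairs.append((allele, term))
        else:
            pairs.append((None, item))
    return pairs

print(parse_clin_sig("A:risk_factor;A:benign"))
print(parse_clin_sig("NEG"))
```

Mixing the two encodings in one column is a likely reason the profiler flags it as imbalanced, so separating them before modelling may be worthwhile.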

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
neg 29115
96.7%
vus 744
 
2.5%
pos 209
 
0.7%
a:risk_factor;a:benign 22
 
0.1%
g:risk_factor;g:benign;g:benign/likely_benign;g:likely_benign 3
 
< 0.1%
t:uncertain_significance;g:risk_factor;g:benign/likely_benign;g:uncertain_significance;g:benign 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
G 29131
32.0%
N 29115
32.0%
E 29115
32.0%
S 953
 
1.0%
V 744
 
0.8%
U 744
 
0.8%
P 209
 
0.2%
O 209
 
0.2%
n 82
 
0.1%
i 78
 
0.1%
Other values (20) 586
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 90265
99.2%
Lowercase Letter 566
 
0.6%
Other Punctuation 100
 
0.1%
Connector Punctuation 35
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 82
14.5%
i 78
13.8%
r 54
9.5%
e 48
8.5%
g 39
 
6.9%
b 37
 
6.5%
k 33
 
5.8%
c 32
 
5.7%
a 30
 
5.3%
f 28
 
4.9%
Other values (6) 105
18.6%
Uppercase Letter
ValueCountFrequency (%)
G 29131
32.3%
N 29115
32.3%
E 29115
32.3%
S 953
 
1.1%
V 744
 
0.8%
U 744
 
0.8%
P 209
 
0.2%
O 209
 
0.2%
A 44
 
< 0.1%
T 1
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
: 61
61.0%
; 35
35.0%
/ 4
 
4.0%
Connector Punctuation
ValueCountFrequency (%)
_ 35
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 90831
99.9%
Common 135
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
G 29131
32.1%
N 29115
32.1%
E 29115
32.1%
S 953
 
1.0%
V 744
 
0.8%
U 744
 
0.8%
P 209
 
0.2%
O 209
 
0.2%
n 82
 
0.1%
i 78
 
0.1%
Other values (16) 451
 
0.5%
Common
ValueCountFrequency (%)
: 61
45.2%
_ 35
25.9%
; 35
25.9%
/ 4
 
3.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 90966
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
G 29131
32.0%
N 29115
32.0%
E 29115
32.0%
S 953
 
1.0%
V 744
 
0.8%
U 744
 
0.8%
P 209
 
0.2%
O 209
 
0.2%
n 82
 
0.1%
i 78
 
0.1%
Other values (20) 586
 
0.6%

pubmed_count
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct58
Distinct (%)0.5%
Missing28072
Missing (%)72.6%
Infinite0
Infinite (%)0.0%
Mean52.084727
Minimum1
Maximum114
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size302.1 KiB

Quantile statistics

Minimum1
5-th percentile1
Q128
median34
Q388
95-th percentile114
Maximum114
Range113
Interquartile range (IQR)60

Descriptive statistics

Standard deviation39.864674
Coefficient of variation (CV)0.76538127
Kurtosis-1.3048763
Mean52.084727
Median Absolute Deviation (MAD)32
Skewness0.35859194
Sum551421
Variance1589.1922
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
114 2007
 
5.2%
76 1002
 
2.6%
34 1000
 
2.6%
28 960
 
2.5%
88 953
 
2.5%
33 953
 
2.5%
1 720
 
1.9%
2 372
 
1.0%
3 360
 
0.9%
41 274
 
0.7%
Other values (48) 1986
 
5.1%
(Missing) 28072
72.6%
ValueCountFrequency (%)
1 720
1.9%
2 372
1.0%
3 360
0.9%
4 99
 
0.3%
5 152
 
0.4%
6 143
 
0.4%
7 98
 
0.3%
8 47
 
0.1%
9 43
 
0.1%
10 9
 
< 0.1%
ValueCountFrequency (%)
114 2007
5.2%
98 4
 
< 0.1%
88 953
2.5%
87 2
 
< 0.1%
85 22
 
0.1%
82 1
 
< 0.1%
80 257
 
0.7%
76 1002
2.6%
60 3
 
< 0.1%
59 8
 
< 0.1%

frequencies
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing38659
Missing (%)100.0%
Memory size302.1 KiB

frequencies_af
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct597
Distinct (%)2.0%
Missing9481
Missing (%)24.5%
Infinite0
Infinite (%)0.0%
Mean0.47556009
Minimum0.0002
Maximum1
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size302.1 KiB

Quantile statistics

Minimum0.0002
5-th percentile0.0218
Q10.2494
median0.3526
Q30.7188
95-th percentile0.9758
Maximum1
Range0.9998
Interquartile range (IQR)0.4694

Descriptive statistics

Standard deviation0.31156372
Coefficient of variation (CV)0.65515111
Kurtosis-1.0056511
Mean0.47556009
Median Absolute Deviation (MAD)0.1823
Skewness0.46408428
Sum13875.892
Variance0.097071953
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0.974 1805
 
4.7%
0.9736 1721
 
4.5%
0.9758 1720
 
4.4%
0.5315 1239
 
3.2%
0.3526 1013
 
2.6%
0.5439 991
 
2.6%
0.3365 984
 
2.5%
0.3558 962
 
2.5%
0.3363 958
 
2.5%
0.3353 953
 
2.5%
Other values (587) 16832
43.5%
(Missing) 9481
24.5%
ValueCountFrequency (%)
0.0002 137
0.4%
0.0004 73
0.2%
0.0006 64
0.2%
0.0008 30
 
0.1%
0.001 24
 
0.1%
0.0012 37
 
0.1%
0.0014 14
 
< 0.1%
0.0016 50
 
0.1%
0.0018 4
 
< 0.1%
0.002 14
 
< 0.1%
ValueCountFrequency (%)
1 105
0.3%
0.9996 1
 
< 0.1%
0.9992 94
0.2%
0.999 34
 
0.1%
0.9986 87
0.2%
0.997 47
0.1%
0.9944 47
0.1%
0.9844 33
 
0.1%
0.9824 43
0.1%
0.9768 33
 
0.1%

frequencies_gnomadg_nfe
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct892
Distinct (%)3.1%
Missing9481
Missing (%)24.5%
Infinite0
Infinite (%)0.0%
Mean0.46965932
Minimum0
Maximum1
Zeros46
Zeros (%)0.1%
Negative0
Negative (%)0.0%
Memory size302.1 KiB

Quantile statistics

Minimum0
5-th percentile0.0343695
Q10.2764
median0.3321
Q30.663
95-th percentile0.9997
Maximum1
Range1
Interquartile range (IQR)0.3866

Descriptive statistics

Standard deviation0.31598939
Coefficient of variation (CV)0.67280553
Kurtosis-0.85395747
Mean0.46965932
Median Absolute Deviation (MAD)0.1461
Skewness0.62176552
Sum13703.72
Variance0.099849296
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0.9997 5417
 
14.0%
0.521 1239
 
3.2%
0.3301 1013
 
2.6%
0.3317 1006
 
2.6%
0.3407 991
 
2.6%
0.3313 982
 
2.5%
0.3322 962
 
2.5%
0.3302 955
 
2.5%
0.3295 953
 
2.5%
0.3165 880
 
2.3%
Other values (882) 14780
38.2%
(Missing) 9481
24.5%
ValueCountFrequency (%)
0 46
0.1%
1.47 × 10⁻⁵ 13
< 0.1%
1.471 × 10⁻⁵ 3
< 0.1%
1.473 × 10⁻⁵ 1
< 0.1%
1.475 × 10⁻⁵ 1
< 0.1%
2.94 × 10⁻⁵ 10
< 0.1%
2.941 × 10⁻⁵ 1
< 0.1%
2.942 × 10⁻⁵ 1
< 0.1%
2.964 × 10⁻⁵ 2
< 0.1%
2.975 × 10⁻⁵ 5
< 0.1%
ValueCountFrequency (%)
1 129
 
0.3%
0.9999 58
 
0.2%
0.9998 33
 
0.1%
0.9997 5417
14.0%
0.9995 47
 
0.1%
0.9946 47
 
0.1%
0.993 47
 
0.1%
0.989 40
 
0.1%
0.9863 47
 
0.1%
0.9852 47
 
0.1%

variant_class
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct3
Distinct (%)< 0.1%
Missing1952
Missing (%)5.0%
Memory size302.1 KiB
SNV
33105 
deletion
 
2637
insertion
 
965

Length

Max length9
Median length3
Mean length3.5169314
Min length3

Characters and Unicode

Total characters129096
Distinct characters12
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowSNV
2nd rowSNV
3rd rowSNV
4th rowSNV
5th rowSNV

Common Values

ValueCountFrequency (%)
SNV 33105
85.6%
deletion 2637
 
6.8%
insertion 965
 
2.5%
(Missing) 1952
 
5.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
snv 33105
90.2%
deletion 2637
 
7.2%
insertion 965
 
2.6%

Most occurring characters

ValueCountFrequency (%)
S 33105
25.6%
N 33105
25.6%
V 33105
25.6%
e 6239
 
4.8%
i 4567
 
3.5%
n 4567
 
3.5%
t 3602
 
2.8%
o 3602
 
2.8%
d 2637
 
2.0%
l 2637
 
2.0%
Other values (2) 1930
 
1.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 99315
76.9%
Lowercase Letter 29781
 
23.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 6239
20.9%
i 4567
15.3%
n 4567
15.3%
t 3602
12.1%
o 3602
12.1%
d 2637
8.9%
l 2637
8.9%
s 965
 
3.2%
r 965
 
3.2%
Uppercase Letter
ValueCountFrequency (%)
S 33105
33.3%
N 33105
33.3%
V 33105
33.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 129096
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 33105
25.6%
N 33105
25.6%
V 33105
25.6%
e 6239
 
4.8%
i 4567
 
3.5%
n 4567
 
3.5%
t 3602
 
2.8%
o 3602
 
2.8%
d 2637
 
2.0%
l 2637
 
2.0%
Other values (2) 1930
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 129096
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S 33105
25.6%
N 33105
25.6%
V 33105
25.6%
e 6239
 
4.8%
i 4567
 
3.5%
n 4567
 
3.5%
t 3602
 
2.8%
o 3602
 
2.8%
d 2637
 
2.0%
l 2637
 
2.0%
Other values (2) 1930
 
1.5%

seq_region_name
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct14
Distinct (%)< 0.1%
Missing1952
Missing (%)5.0%
Infinite0
Infinite (%)0.0%
Mean12.282317
Minimum1
Maximum22
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size302.1 KiB

Quantile statistics

Minimum1
5-th percentile2
Q111
median13
Q317
95-th percentile17
Maximum22
Range21
Interquartile range (IQR)6

Descriptive statistics

Standard deviation5.069982
Coefficient of variation (CV)0.41278711
Kurtosis-0.27768666
Mean12.282317
Median Absolute Deviation (MAD)4
Skewness-0.81667676
Sum450847
Variance25.704717
MonotonicityNot monotonic
Histogram with fixed size bins (bins=14)
ValueCountFrequency (%)
13 13595
35.2%
17 10865
28.1%
2 2930
 
7.6%
11 2329
 
6.0%
3 1652
 
4.3%
5 1311
 
3.4%
7 1010
 
2.6%
16 717
 
1.9%
22 566
 
1.5%
8 473
 
1.2%
Other values (4) 1259
 
3.3%
(Missing) 1952
 
5.0%
ValueCountFrequency (%)
1 206
 
0.5%
2 2930
 
7.6%
3 1652
 
4.3%
4 370
 
1.0%
5 1311
 
3.4%
7 1010
 
2.6%
8 473
 
1.2%
10 336
 
0.9%
11 2329
 
6.0%
13 13595
35.2%
ValueCountFrequency (%)
22 566
 
1.5%
19 347
 
0.9%
17 10865
28.1%
16 717
 
1.9%
13 13595
35.2%
11 2329
 
6.0%
10 336
 
0.9%
8 473
 
1.2%
7 1010
 
2.6%
5 1311
 
3.4%

MSP
Text

Distinct1671
Distinct (%)4.3%
Missing0
Missing (%)0.0%
Memory size302.1 KiB

Length

Max length7
Median length7
Mean length7
Min length7

Characters and Unicode

Total characters270613
Distinct characters34
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row828944H
2nd row828944H
3rd row828944H
4th row828944H
5th row828944H
ValueCountFrequency (%)
e656391 523
 
1.4%
803127n 500
 
1.3%
266865y 486
 
1.3%
a435161 470
 
1.2%
246950c 464
 
1.2%
810713f 449
 
1.2%
906063n 443
 
1.1%
892831l 441
 
1.1%
327452s 441
 
1.1%
195336c 438
 
1.1%
Other values (1661) 34004
88.0%

Most occurring characters

ValueCountFrequency (%)
8 28183
10.4%
9 28074
10.4%
2 27084
10.0%
1 24279
9.0%
3 23197
8.6%
7 22564
8.3%
6 21216
7.8%
5 20453
7.6%
0 19985
7.4%
4 18734
6.9%
Other values (24) 36844
13.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 233769
86.4%
Uppercase Letter 36838
 
13.6%
Lowercase Letter 6
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
L 3182
 
8.6%
T 2727
 
7.4%
C 2247
 
6.1%
N 2101
 
5.7%
P 2030
 
5.5%
H 1973
 
5.4%
U 1965
 
5.3%
S 1778
 
4.8%
K 1708
 
4.6%
Y 1644
 
4.5%
Other values (13) 15483
42.0%
Decimal Number
ValueCountFrequency (%)
8 28183
12.1%
9 28074
12.0%
2 27084
11.6%
1 24279
10.4%
3 23197
9.9%
7 22564
9.7%
6 21216
9.1%
5 20453
8.7%
0 19985
8.5%
4 18734
8.0%
Lowercase Letter
ValueCountFrequency (%)
y 6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 233769
86.4%
Latin 36844
 
13.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
L 3182
 
8.6%
T 2727
 
7.4%
C 2247
 
6.1%
N 2101
 
5.7%
P 2030
 
5.5%
H 1973
 
5.4%
U 1965
 
5.3%
S 1778
 
4.8%
K 1708
 
4.6%
Y 1644
 
4.5%
Other values (14) 15489
42.0%
Common
ValueCountFrequency (%)
8 28183
12.1%
9 28074
12.0%
2 27084
11.6%
1 24279
10.4%
3 23197
9.9%
7 22564
9.7%
6 21216
9.1%
5 20453
8.7%
0 19985
8.5%
4 18734
8.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 270613
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
8 28183
10.4%
9 28074
10.4%
2 27084
10.0%
1 24279
9.0%
3 23197
8.6%
7 22564
8.3%
6 21216
7.8%
5 20453
7.6%
0 19985
7.4%
4 18734
6.9%
Other values (24) 36844
13.6%

RIS.
Categorical

IMBALANCE 

Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size302.1 KiB
NEG
38436 
VUS
 
126
POS
 
86
POS VUS
 
11

Length

Max length8
Median length3
Mean length3.0014227
Min length3

Characters and Unicode

Total characters116032
Distinct characters9
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNEG
2nd rowNEG
3rd rowNEG
4th rowNEG
5th rowNEG

Common Values

ValueCountFrequency (%)
NEG 38436
99.4%
VUS 126
 
0.3%
POS 86
 
0.2%
POS VUS 11
 
< 0.1%
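
The 11 "POS VUS" rows have a length of 8 rather than 7, and the character statistics for this variable count 22 Control characters, so the two labels appear to be separated by two control characters (possibly a carriage return and line feed) rather than a single space. A minimal sketch of normalising such cells follows. The raw string used here is a hypothetical example, since the report does not show which control characters are present.

```python
import unicodedata

def normalize_label(value):
    # str.split() with no argument splits on any run of whitespace,
    # including carriage returns and line feeds.
    return " ".join(value.split())

raw = "POS\r\nVUS"  # hypothetical raw cell; the actual control characters are not shown
control_count = sum(unicodedata.category(ch) == "Cc" for ch in raw)

print(repr(normalize_label(raw)), len(raw), control_count)
```

After normalisation the combined label could be split into separate POS and VUS flags, if that suits the analysis.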

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
neg 38436
99.4%
vus 137
 
0.4%
pos 97
 
0.3%

Most occurring characters

ValueCountFrequency (%)
N 38436
33.1%
E 38436
33.1%
G 38436
33.1%
S 234
 
0.2%
V 137
 
0.1%
U 137
 
0.1%
P 97
 
0.1%
O 97
 
0.1%
(control character) 22
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 116010
> 99.9%
Control 22
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 38436
33.1%
E 38436
33.1%
G 38436
33.1%
S 234
 
0.2%
V 137
 
0.1%
U 137
 
0.1%
P 97
 
0.1%
O 97
 
0.1%
Control
ValueCountFrequency (%)
(control character) 22
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 116010
> 99.9%
Common 22
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 38436
33.1%
E 38436
33.1%
G 38436
33.1%
S 234
 
0.2%
V 137
 
0.1%
U 137
 
0.1%
P 97
 
0.1%
O 97
 
0.1%
Common
ValueCountFrequency (%)
(control character) 22
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 116032
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 38436
33.1%
E 38436
33.1%
G 38436
33.1%
S 234
 
0.2%
V 137
 
0.1%
U 137
 
0.1%
P 97
 
0.1%
O 97
 
0.1%
(control character) 22
 
< 0.1%

Interactions

2023-11-24T11:28:37.094170image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:38.576427image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:40.066141image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:41.754779image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:43.360764image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:23.289688image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:24.739587image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:26.270474image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:27.932259image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:29.465677image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:31.005113image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:32.508589image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:34.213112image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:35.746379image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:37.199814image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:38.692045image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:40.378989image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:41.871367image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:43.471415image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:23.397873image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:24.848195image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:26.382080image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:28.042844image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:29.607749image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:31.116706image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:32.620244image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:34.329708image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:35.854906image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:37.308475image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:38.798701image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:40.489595image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:41.981977image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:43.575998image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:23.503479image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:24.958781image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:26.487740image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:28.147451image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:29.717266image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:31.223329image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:32.724868image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:34.445315image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:35.955536image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:37.414647image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:38.912303image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:40.596176image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:42.091537image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:43.681598image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:23.607093image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:25.081377image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:26.592346image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:28.255044image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:29.827196image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:31.329946image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:32.831469image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:34.559929image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:36.061143image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:37.521874image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:39.020716image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:40.701293image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:42.196160image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:43.795211image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:23.720689image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:25.206799image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:26.707560image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:28.371642image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:29.944768image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:31.448551image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:32.949075image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:34.678551image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:36.176748image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:37.636477image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:39.135384image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:40.816460image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:42.315506image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:43.896802image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:23.819327image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:25.318509image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:26.807132image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:28.475167image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:30.053006image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:31.550139image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:33.052654image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:34.788167image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:36.276355image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:37.739100image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:39.236997image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:40.919074image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:42.417600image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:43.999420image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:23.918943image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:25.420178image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:26.914089image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:28.578328image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:30.156579image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:31.654738image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:33.155269image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:34.895813image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:36.378957image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:37.837703image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:39.338615image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:41.020672image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:42.522849image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:44.104024image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:24.021576image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:25.526982image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:27.022684image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:28.690801image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:30.264119image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:31.765155image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:33.260859image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:35.008979image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:36.481544image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:37.942527image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:39.441171image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:41.128256image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:42.630480image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:44.208635image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:24.127220image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:25.632589image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:27.296146image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:28.814507image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:30.371363image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:31.876049image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:33.364432image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:35.120600image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:36.584135image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:38.047113image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:39.547171image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:41.233845image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:42.738074image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:44.317257image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:24.232848image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:25.739314image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:27.402317image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:28.931057image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:30.480906image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:31.985058image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:33.468029image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:35.234220image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:36.683734image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:38.153727image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:39.656741image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:41.341390image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/
2023-11-24T11:28:42.845678image/svg+xmlMatplotlib v3.7.3, https://matplotlib.org/

Correlations

variable | CHROM | POS | 1000gp3_eur_af | clinpred_rankscore | clinvar_id | gnomad_exomes_non_cancer_nfe_af | mutationassessor_rankscore | mutationtaster_converted_rankscore | polyphen2_hdiv_rankscore | sift_converted_rankscore | pubmed_count | frequencies_af | frequencies_gnomadg_nfe | seq_region_name | AF | GENEINFO | TISSUE | CTYPE | GT | clinpred_pred | strand | clin_sig_allele | variant_class | RIS.
CHROM | 1.000 | -0.288 | -0.185 | -0.119 | -0.166 | -0.147 | 0.050 | 0.090 | 0.060 | 0.201 | 0.282 | -0.097 | -0.117 | 1.000 | 0.029 | 1.000 | 0.223 | 0.597 | 0.244 | 0.099 | 0.860 | 0.046 | 0.164 | 0.029
POS | -0.288 | 1.000 | -0.070 | 0.114 | -0.057 | -0.028 | -0.049 | 0.403 | 0.006 | 0.148 | -0.289 | 0.028 | -0.011 | -0.281 | 1.000 | 1.000 | 0.216 | 0.583 | 0.161 | 0.073 | 0.349 | 0.109 | 0.172 | 0.024
1000gp3_eur_af | -0.185 | -0.070 | 1.000 | 0.186 | 0.677 | 1.000 | -0.430 | -0.050 | -0.299 | -0.541 | 0.884 | 0.970 | 1.000 | -0.185 | 0.107 | 0.526 | 0.053 | 0.542 | 0.506 | 0.044 | 0.883 | 0.295 | 1.000 | 0.031
clinpred_rankscore | -0.119 | 0.114 | 0.186 | 1.000 | 0.290 | 0.056 | 0.590 | 0.213 | 0.702 | 0.465 | -0.124 | 0.023 | 0.185 | -0.119 | 0.000 | 0.166 | 0.558 | 0.063 | 0.473 | 0.977 | 0.116 | 0.395 | 1.000 | 0.199
clinvar_id | -0.166 | -0.057 | 0.677 | 0.290 | 1.000 | 0.582 | 0.087 | -0.148 | -0.001 | -0.235 | 0.006 | 0.685 | 0.676 | -0.166 | 0.000 | 0.165 | 0.507 | 0.326 | 0.456 | 0.430 | 0.070 | 0.333 | 1.000 | 0.090
gnomad_exomes_non_cancer_nfe_af | -0.147 | -0.028 | 1.000 | 0.056 | 0.582 | 1.000 | -0.431 | -0.083 | -0.315 | -0.549 | 0.888 | 0.970 | 1.000 | -0.147 | 0.108 | 0.517 | 0.174 | 0.531 | 0.536 | 0.195 | 0.857 | 0.209 | 1.000 | 0.105
mutationassessor_rankscore | 0.050 | -0.049 | -0.430 | 0.590 | 0.087 | -0.431 | 1.000 | -0.329 | 0.944 | 0.837 | -0.212 | -0.693 | -0.430 | 0.050 | 0.000 | 0.233 | 0.290 | 0.363 | 0.303 | 0.515 | 0.216 | 0.250 | 1.000 | 0.126
mutationtaster_converted_rankscore | 0.090 | 0.403 | -0.050 | 0.213 | -0.148 | -0.083 | -0.329 | 1.000 | -0.157 | 0.032 | 0.212 | -0.048 | -0.051 | 0.090 | 0.009 | 0.218 | 0.276 | 0.200 | 0.275 | 0.598 | 0.313 | 0.330 | 1.000 | 0.212
polyphen2_hdiv_rankscore | 0.060 | 0.006 | -0.299 | 0.702 | -0.001 | -0.315 | 0.944 | -0.157 | 1.000 | 0.848 | -0.100 | -0.606 | -0.298 | 0.060 | 0.000 | 0.159 | 0.243 | 0.410 | 0.249 | 0.547 | 0.252 | 0.278 | 1.000 | 0.170
sift_converted_rankscore | 0.201 | 0.148 | -0.541 | 0.465 | -0.235 | -0.549 | 0.837 | 0.032 | 0.848 | 1.000 | -0.207 | -0.680 | -0.542 | 0.201 | 0.046 | 0.344 | 0.297 | 0.189 | 0.383 | 0.554 | 0.519 | 0.241 | 1.000 | 0.157
pubmed_count | 0.282 | -0.289 | 0.884 | -0.124 | 0.006 | 0.888 | -0.212 | 0.212 | -0.100 | -0.207 | 1.000 | 0.164 | 0.084 | 0.282 | 0.000 | 0.144 | 0.047 | 0.622 | 0.128 | 0.554 | 0.526 | 0.181 | 0.000 | 0.208
frequencies_af | -0.097 | 0.028 | 0.970 | 0.023 | 0.685 | 0.970 | -0.693 | -0.048 | -0.606 | -0.680 | 0.164 | 1.000 | 0.956 | -0.097 | 0.063 | 0.510 | 0.070 | 0.377 | 0.486 | 0.045 | 0.704 | 0.056 | 0.052 | 0.045
frequencies_gnomadg_nfe | -0.117 | -0.011 | 1.000 | 0.185 | 0.676 | 1.000 | -0.430 | -0.051 | -0.298 | -0.542 | 0.084 | 0.956 | 1.000 | -0.117 | 0.063 | 0.493 | 0.075 | 0.410 | 0.489 | 0.044 | 0.701 | 0.080 | 0.053 | 0.045
seq_region_name | 1.000 | -0.281 | -0.185 | -0.119 | -0.166 | -0.147 | 0.050 | 0.090 | 0.060 | 0.201 | 0.282 | -0.097 | -0.117 | 1.000 | 0.029 | 1.000 | 0.215 | 0.598 | 0.243 | 0.099 | 0.860 | 0.046 | 0.164 | 0.028
AF | 0.029 | 1.000 | 0.107 | 0.000 | 0.000 | 0.108 | 0.000 | 0.009 | 0.000 | 0.046 | 0.000 | 0.063 | 0.063 | 0.029 | 1.000 | 0.029 | 0.013 | 1.000 | 0.057 | 0.000 | 0.029 | 0.000 | 0.000 | 0.000
GENEINFO | 1.000 | 1.000 | 0.526 | 0.166 | 0.165 | 0.517 | 0.233 | 0.218 | 0.159 | 0.344 | 0.144 | 0.510 | 0.493 | 1.000 | 0.029 | 1.000 | 0.309 | 1.000 | 0.317 | 0.210 | 1.000 | 0.154 | 0.059 | 0.012
TISSUE | 0.223 | 0.216 | 0.053 | 0.558 | 0.507 | 0.174 | 0.290 | 0.276 | 0.243 | 0.297 | 0.047 | 0.070 | 0.075 | 0.215 | 0.013 | 0.309 | 1.000 | 0.205 | 0.810 | 0.407 | 0.030 | 0.434 | 0.094 | 0.041
CTYPE | 0.597 | 0.583 | 0.542 | 0.063 | 0.326 | 0.531 | 0.363 | 0.200 | 0.410 | 0.189 | 0.622 | 0.377 | 0.410 | 0.598 | 1.000 | 1.000 | 0.205 | 1.000 | 0.208 | 0.007 | 0.056 | 0.050 | 0.171 | 0.043
GT | 0.244 | 0.161 | 0.506 | 0.473 | 0.456 | 0.536 | 0.303 | 0.275 | 0.249 | 0.383 | 0.128 | 0.486 | 0.489 | 0.243 | 0.057 | 0.317 | 0.810 | 0.208 | 1.000 | 0.483 | 0.172 | 0.429 | 0.158 | 0.044
clinpred_pred | 0.099 | 0.073 | 0.044 | 0.977 | 0.430 | 0.195 | 0.515 | 0.598 | 0.547 | 0.554 | 0.554 | 0.045 | 0.044 | 0.099 | 0.000 | 0.210 | 0.407 | 0.007 | 0.483 | 1.000 | 0.094 | 0.623 | 1.000 | 0.276
strand | 0.860 | 0.349 | 0.883 | 0.116 | 0.070 | 0.857 | 0.216 | 0.313 | 0.252 | 0.519 | 0.526 | 0.704 | 0.701 | 0.860 | 0.029 | 1.000 | 0.030 | 0.056 | 0.172 | 0.094 | 1.000 | 0.028 | 0.098 | 0.020
clin_sig_allele | 0.046 | 0.109 | 0.295 | 0.395 | 0.333 | 0.209 | 0.250 | 0.330 | 0.278 | 0.241 | 0.181 | 0.056 | 0.080 | 0.046 | 0.000 | 0.154 | 0.434 | 0.050 | 0.429 | 0.623 | 0.028 | 1.000 | 0.068 | 0.375
variant_class | 0.164 | 0.172 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.052 | 0.053 | 0.164 | 0.000 | 0.059 | 0.094 | 0.171 | 0.158 | 1.000 | 0.098 | 0.068 | 1.000 | 0.016
RIS. | 0.029 | 0.024 | 0.031 | 0.199 | 0.090 | 0.105 | 0.126 | 0.212 | 0.170 | 0.157 | 0.208 | 0.045 | 0.045 | 0.028 | 0.000 | 0.012 | 0.041 | 0.043 | 0.044 | 0.276 | 0.020 | 0.375 | 0.016 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
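
A minimal sketch of how a nullity correlation could be computed, assuming it is the Pearson correlation between per-column missing-value indicators (1 for a missing cell, 0 otherwise). The report does not state the exact formula behind its heatmap, so treat this as an illustration only. The helper names `pearson` and `nullity_correlation` and the toy columns are hypothetical and do not come from the dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def nullity_correlation(columns):
    """Correlate missing-value indicators for every pair of columns.

    `columns` maps a column name to its list of values, with None marking
    a missing cell. Columns that are never or always missing are skipped,
    because a constant indicator has no defined correlation.
    """
    indicators = {
        name: [1 if value is None else 0 for value in values]
        for name, values in columns.items()
    }
    varying = {
        name: ind for name, ind in indicators.items()
        if 0 < sum(ind) < len(ind)
    }
    return {
        (a, b): pearson(varying[a], varying[b])
        for a in varying
        for b in varying
    }


# Hypothetical toy data, not taken from the profiled dataset.
toy = {
    "a": [1.0, None, 3.0, None],
    "b": [5.0, None, 7.0, None],   # missing exactly where "a" is missing
    "c": [None, 2.0, None, 4.0],   # missing exactly where "a" is present
    "d": [1.0, 2.0, 3.0, 4.0],     # never missing, so it is skipped
}
result = nullity_correlation(toy)
for pair, value in sorted(result.items()):
    print(pair, round(value, 3))
```

Under this assumption, a value near 1 means two columns tend to be missing in the same rows, a value near -1 means one tends to be present where the other is missing, and a value near 0 means their missingness is unrelated.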

Sample

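index | CHROM | POS | REF | ALT | AF | GENEINFO | NAME | TISSUE | CTYPE | GT | 1000gp3_eur_af | clinpred_pred | clinpred_rankscore | clinvar_id | domains_count | gnomad_exomes_non_cancer_nfe_af | mutationassessor_rankscore | mutationtaster_converted_rankscore | polyphen2_hdiv_rankscore | sift_converted_rankscore | strand | sift_score | hgvsc | clin_sig_allele | pubmed_count | frequencies | frequencies_af | frequencies_gnomadg_nfe | variant_class | seq_region_name | MSP | RIS.
0 | 13 | 32890572 | G | A | 0.0 | BRCA2 | BRCA24/19 | GERMLINE | BRCA | 0/1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | ENST00000380152.8:c.-26G>A | NEG | NaN | NaN | 0.2093 | 0.2663 | SNV | 13.0 | 828944H | NEG
1 | 13 | 32912299 | T | C | 0.0 | BRCA2 | BRCA24/19 | GERMLINE | BRCA | 0/1 | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | ENST00000380152.8:c.3807T>C | NEG | NaN | NaN | 0.1681 | 0.1860 | SNV | 13.0 | 828944H | NEG
2 | 13 | 32913055 | A | G | 0.0 | BRCA2 | BRCA24/19 | GERMLINE | BRCA | 1/1 | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | ENST00000380152.8:c.4563A>G | NEG | NaN | NaN | 0.9740 | 0.9997 | SNV | 13.0 | 828944H | NEG
3 | 13 | 32915005 | G | C | 0.0 | BRCA2 | BRCA24/19 | GERMLINE | BRCA | 1/1 | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | ENST00000380152.8:c.6513G>C | NEG | NaN | NaN | 0.9736 | 0.9997 | SNV | 13.0 | 828944H | NEG
4 | 13 | 32929387 | T | C | 0.0 | BRCA2 | BRCA24/19 | GERMLINE | BRCA | 1/1 | 0.999006 | T | 0.00085 | 133738.0 | 2.0 | 0.999707 | NaN | 0.08975 | NaN | 0.00964 | 1.0 | 1.0 | ENST00000380152.8:c.7397T>C | NEG | NaN | NaN | 0.9758 | 0.9997 | SNV | 13.0 | 828944H | NEG
5 | 13 | 32936646 | T | C | 0.0 | BRCA2 | BRCA24/19 | GERMLINE | BRCA | 0/1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | ENST00000380152.8:c.7806-14T>C | NEG | NaN | NaN | 0.5315 | 0.5210 | SNV | 13.0 | 828944H | NEG
6 | 13 | 32906729 | A | C | 0.0 | BRCA2 | BRCA25/19 | GERMLINE | BRCA | 1/1 | 0.295229 | T | 0.00012 | 9329.0 | 2.0 | 0.278914 | NaN | 0.08975 | NaN | 0.25768 | 1.0 | 0.04 | ENST00000380152.8:c.1114A>C | NEG | NaN | NaN | 0.2494 | 0.2764 | SNV | 13.0 | 831734X | NEG
7 | 13 | 32913055 | A | G | 0.0 | BRCA2 | BRCA25/19 | GERMLINE | BRCA | 1/1 | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | ENST00000380152.8:c.4563A>G | NEG | NaN | NaN | 0.9740 | 0.9997 | SNV | 13.0 | 831734X | NEG
8 | 13 | 32915005 | G | C | 0.0 | BRCA2 | BRCA25/19 | GERMLINE | BRCA | 1/1 | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | ENST00000380152.8:c.6513G>C | NEG | NaN | NaN | 0.9736 | 0.9997 | SNV | 13.0 | 831734X | NEG
9 | 13 | 32929387 | T | C | 0.0 | BRCA2 | BRCA25/19 | GERMLINE | BRCA | 1/1 | 0.999006 | T | 0.00085 | 133738.0 | 2.0 | 0.999707 | NaN | 0.08975 | NaN | 0.00964 | 1.0 | 1.0 | ENST00000380152.8:c.7397T>C | NEG | NaN | NaN | 0.9758 | 0.9997 | SNV | 13.0 | 831734X | NEG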
CHROMPOSREFALTAFGENEINFONAMETISSUECTYPEGT1000gp3_eur_afclinpred_predclinpred_rankscoreclinvar_iddomains_countgnomad_exomes_non_cancer_nfe_afmutationassessor_rankscoremutationtaster_converted_rankscorepolyphen2_hdiv_rankscoresift_converted_rankscorestrandsift_scorehgvscclin_sig_allelepubmed_countfrequenciesfrequencies_affrequencies_gnomadg_nfevariant_classseq_region_nameMSPRIS.
38649191222268AGNaNNaNBRCA84/20GERMLINEBRCA0/1NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN1.0NaNENST00000326873.12:c.920+263A>GNEG1.0NaN0.70750.4664SNV19.0961115FNEG
38650191226772CTNaNNaNBRCA84/20GERMLINEBRCA0/1NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN1.0NaNENST00000326873.12:c.*16+110C>TNEG1.0NaN0.27140.2150SNV19.0961115FNEG
38651191226901GTNaNNaNBRCA84/20GERMLINEBRCA0/1NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN1.0NaNENST00000326873.12:c.*16+239G>TNEGNaNNaN0.24820.2143SNV19.0961115FNEG
386522229085060CTNaNNaNBRCA84/20GERMLINEBRCA0/1NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN-1.0NaNENST00000404276.6:c.1542+63G>ANaNNaNNaNNaNNaNSNV22.0961115FNEG
386532229085138GGGAGANaNNaNBRCA84/20GERMLINEBRCA0/1NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN961115FNEG
386542229085168CGNaNNaNBRCA84/20GERMLINEBRCA0/1NaNNaNNaNNaN2.0NaNNaNNaNNaNNaN-1.0NaNENST00000404276.6:c.1497G>CNEGNaNNaNNaNNaNSNV22.0961115FNEG
386552229085257AGNaNNaNBRCA84/20GERMLINEBRCA0/1NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN-1.0NaNENST00000404276.6:c.1462-54T>CNaNNaNNaNNaNNaNSNV22.0961115FNEG
386562229091300TCNaNNaNBRCA84/20GERMLINEBRCA0/1NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN-1.0NaNENST00000404276.6:c.1260-70A>GNaNNaNNaNNaNNaNSNV22.0961115FNEG
386572229130300CTNaNNaNBRCA84/20GERMLINEBRCA1/1NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN-1.0NaNENST00000404276.6:c.319+91G>ANEGNaNNaN0.27340.2807SNV22.0961115FNEG
386582229130813GAAAAAAAAAAAAAGAAAAAAAAAAAAAANaNNaNBRCA84/20GERMLINEBRCA0/1NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN961115FNEG

Duplicate rows

Most frequently occurring

index | CHROM | POS | REF | ALT | AF | GENEINFO | NAME | TISSUE | CTYPE | GT | 1000gp3_eur_af | clinpred_pred | clinpred_rankscore | clinvar_id | domains_count | gnomad_exomes_non_cancer_nfe_af | mutationassessor_rankscore | mutationtaster_converted_rankscore | polyphen2_hdiv_rankscore | sift_converted_rankscore | strand | sift_score | hgvsc | clin_sig_allele | pubmed_count | frequencies_af | frequencies_gnomadg_nfe | variant_class | seq_region_name | MSP | RIS. | # duplicates
0 | 13 | 32893206 | TT | T | 0.0 | BRCA2 | BRCA160/21 | SOMATIC | BRCA | 0/0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | ENST00000380152.8:c.68-4del | NaN | NaN | NaN | NaN | deletion | 13.0 | A435161 | NEG | 2